Three simultaneous speech recognition by integration of active audition and face recognition for humanoid
Authors
Abstract
This paper addresses how a humanoid with two microphones can listen to three simultaneous talkers. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech are difficult because the number of simultaneous talkers exceeds the number of microphones, the signal-to-noise ratio is quite low (around -3 dB), and the noise is not stable due to interfering voices. The humanoid audition system consists of sound separation, face recognition, and ASR. Sound sources are separated by an active direction-pass filter (ADPF), which extracts sounds from a specified direction in real time. Since the features of sounds separated by the ADPF vary according to the sound direction, ASR uses multiple direction- and speaker-dependent acoustic models. To select the best ASR result, the system integrates the results using the sound direction, the speaker information obtained by face recognition, and the confidence measure of each ASR result. The resulting system improves word recognition rates for three simultaneous utterances.
Similar resources
Robot recognizes three simultaneous speech by active audition
Abstract: Robots should listen to and recognize speech with their own ears under noisy environments and simultaneous speeches to attain smooth communications with people in a real world. This paper presents three simultaneous speech recognition based on active audition which integrates audition with motion. Our robot audition system consists of three modules: a real-time human tracking system, an acti...
Improvement of three simultaneous speech recognition by using AV integration and scattering theory for humanoid
This paper presents improvement of recognition of three simultaneous speeches for a humanoid robot with a pair of microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech are difficult, because the number of simultaneous talkers exceeds that of its microphones, the signal-to-noise ratio is quite low (around -3 dB) and noise is not stable d...
Simultaneous Speech Recognition Based on Automatic Missing Feature Mask Generation by Integrating Sound Source Separation
Our goal is to realize a humanoid robot that is capable of recognizing simultaneous speech. A humanoid robot in real-world environments usually hears a mixture of sounds, and thus three capabilities are essential for robot audition: sound source localization, separation, and recognition of separated sounds. In particular, an interface between sound source separation and speech reco...
Modelling of Eyeball with Pan/Tilt Mechanism and Intelligent Face Recognition Using Local Binary Pattern Operator
This paper describes the vision system for a humanoid robot, which includes the mechanism that controls eyeball orientation and the blinking process. Along with the designed mechanism, the orientation of the camera is integrated with the controlling servomotors. This vision system is biomimetic and is designed to match the size of the human eye. This prototype runs face recognition and identifies, match...
Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation
Robot audition systems require capabilities for sound source separation and the recognition of separated sounds, since we hear a mixture of sounds in our daily lives, especially mixtures of speech. We report a robot audition system with a pair of omni-directional microphones embedded in a humanoid that recognizes two simultaneous talkers. It first separates the sound sources by Independent Compone...